Add new array functions from upstream DataFusion v53: array_any_value, array_distance, array_max, array_min, array_reverse, arrays_zip, string_to_array, and gen_series. Add corresponding list_* aliases and missing list_* aliases for existing functions (list_empty, list_pop_back, list_pop_front, list_has, list_has_all, list_has_any). Also add array_contains/list_contains as aliases for array_has, generate_series as alias for gen_series, and string_to_list as alias for string_to_array. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests cover all functions and aliases added in the previous commit: array_any_value, array_distance, array_max, array_min, array_reverse, arrays_zip, string_to_array, gen_series, generate_series, array_contains, list_contains, list_empty, list_pop_back, list_pop_front, list_has, list_has_all, list_has_any, and list_* aliases for the new functions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…comment
- Make null_string optional in string_to_array/string_to_list
- Make step optional in gen_series/generate_series
- Rename second_array to element in array_contains/list_has/list_contains
- Restore # Window Functions section comment in __all__
- Add tests for optional parameter variants

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reduce 26 individual tests to 14 test functions with parametrized cases, eliminating boilerplate while maintaining full coverage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…block Merge standalone tests for list_empty, list_pop_back, list_pop_front, list_has, array_contains, list_contains, list_has_all, and list_has_any into the existing parametrized test_array_functions block alongside their array_* counterparts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use the richer multi-row dataset (including all-nulls case) for both array_any_value and list_any_value via the parametrized test. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
5b592dc to ef48dd9
Pull request overview
This PR exposes several upstream DataFusion array/list scalar functions and aliases through the datafusion-python API, and adds Python unit tests to validate the new bindings and aliases (closing #1452).
Changes:
- Added Python API exports and wrappers for new array/list functions and `list_*` aliases (e.g., `array_any_value`, `array_distance`, `array_max`/`array_min`, `array_reverse`, `arrays_zip`, `string_to_array`, `gen_series`, plus `list_*` aliases).
- Added Rust pyo3 bindings for newly exposed functions that weren't previously available in the Python extension module.
- Expanded unit test coverage to exercise new functions and alias behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `python/datafusion/functions.py` | Adds new public function exports (`__all__`) and Python-level wrappers/aliases for array/list functions. |
| `crates/core/src/functions.rs` | Adds pyo3 bindings for new DataFusion nested functions/UDFs and registers them in the Python extension module. |
| `python/tests/test_functions.py` | Adds unit tests for new functions and alias coverage in both the general array-function parametrized suite and targeted tests. |
These aliases match the upstream DataFusion SQL-level aliases, completing the set of missing array functions from issue apache#1452. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ntjohnson1 left a comment
Two small things; otherwise LGTM.
I didn't check the upstream details at all. I assumed that since the Python API tests look reasonable and everything compiles, things are well aligned.
I didn't cross things off one by one, but I did a general check that this resolves the items listed in the original issue (the original issue didn't call out list_overlap, but it's reasonable to include).
Any parts matching the optional ``null_string`` will be replaced with ``NULL``.

Examples:
This doesn't demonstrate the optional parameter.
We can probably update the Copilot rules to say that functions should have examples covering base functionality, plus extra examples for optional arguments.
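As a rough illustration of what such a pair of examples would need to show, here is a pure-Python model of the documented behaviour: split on a delimiter, then replace any part equal to the optional `null_string` with a null. The name `split_with_nulls` is hypothetical and this is not the datafusion-python binding or the upstream implementation; it only mirrors the sentence quoted from the docstring.

```python
def split_with_nulls(text, delimiter, null_string=None):
    """Pure-Python model of the docstring's description (hypothetical helper).

    Splits ``text`` on ``delimiter``; when ``null_string`` is given, any part
    equal to it is replaced with ``None`` (standing in for SQL ``NULL``).
    """
    parts = text.split(delimiter)
    if null_string is None:
        # Base behaviour: no replacement is performed.
        return parts
    return [None if part == null_string else part for part in parts]


# Same input, without and then with the optional argument passed by keyword.
base = split_with_nulls("a,b,NA,c", ",")
with_nulls = split_with_nulls("a,b,NA,c", ",", null_string="NA")
```

Using the same input for both calls makes the effect of the optional argument visible: only the `NA` element differs between the two results.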
Unlike :py:func:`range`, this includes the upper bound.

Examples:
Missing optional parameter example
…_series Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Python Function Docstrings

Every Python function must include a docstring with usage examples.

- **Examples are required**: Each function needs at least one doctest-style example
  demonstrating basic usage.
- **Optional parameters**: If a function has optional parameters, include separate
  examples that show usage both without and with the optional arguments. Pass
  optional arguments using their keyword name (e.g., `step=dfn.lit(3)`) so readers
  can immediately see which parameter is being demonstrated.
- **Reuse input data**: Use the same input data across examples wherever possible.
  The examples should demonstrate how different optional arguments change the output
  for the same input, making the effect of each option easy to understand.
- **Alias functions**: Functions that are simple aliases (e.g., `list_sort` aliasing
  `array_sort`) only need a one-line description and a `See Also` reference to the
  primary function. They do not need their own examples.
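A minimal sketch of a docstring pair that follows these rules is shown here. It uses plain Python integers rather than datafusion expressions so that it runs on its own; `inclusive_series` and `inclusive_sequence` are hypothetical names chosen for illustration and are not part of the datafusion-python API.

```python
def inclusive_series(start, stop, step=1):
    """Return the integers from ``start`` to ``stop``, including ``stop``.

    Hypothetical stand-in used only to illustrate the docstring layout.

    Examples:
        >>> inclusive_series(1, 7)
        [1, 2, 3, 4, 5, 6, 7]

        Same input, with the optional ``step`` passed by keyword:

        >>> inclusive_series(1, 7, step=3)
        [1, 4, 7]
    """
    if step == 0:
        raise ValueError("step must not be zero")
    # Push the bound one unit past ``stop`` in the direction of travel,
    # so that ``stop`` itself can appear in the result.
    bound = stop + (1 if step > 0 else -1)
    return list(range(start, bound, step))


def inclusive_sequence(start, stop, step=1):
    """Alias for :py:func:`inclusive_series`.

    See Also:
        :py:func:`inclusive_series`
    """
    return inclusive_series(start, stop, step)
```

Saved to a file, the examples can be checked with `python -m doctest <file>`. The alias carries only a one-line description and a `See Also` entry, as the last bullet describes.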
@ntjohnson1 Do you think we should add anything else here?
Looks like it covers the majority of things to me. The only other piece is the stylistic preference on specifying "Returns" or not. I don't know if there is a definitive position on that.
Yes, and I've updated the skill that generated the original issue because we had both some false positives and false negatives.

Thanks for the review @ntjohnson1!
Which issue does this PR close?
Closes #1452
Rationale for this change
These functions are available upstream but are not exposed through the Python API.
What changes are included in this PR?
Add python API
Add unit tests
Are there any user-facing changes?
Addition only.